NMAR: An R Package for Estimation under Nonignorable Nonresponse in Sample Surveys
Tackling Nonignorable Missingness in Official Surveys with R
National Science Centre, Poland (OPUS 20 grant no. 2020/39/B/HS4/00941)
Dr Maciej Beręsewicz (Poznań University of Economics and Business), Igor Kołodziej (WUT - Faculty of Mathematics), Mateusz Iwaniuk (WUT - Faculty of Mathematics)
NMAR - Not Missing At Random
Motivation: The Challenge of Nonignorable Nonresponse (NMAR)
Missing Data Structure: We are interested in estimating the mean of \(Y\), which is subject to missingness. Each observation \(y_i\) has a corresponding response indicator \(\delta_i\), where \(\delta_i = 1\) if \(y_i\) is observed and \(\delta_i = 0\) otherwise.
Assume that response depends not only on salary (\(Y\)) but also on age:
\(\delta \sim Y + \text{age}\)
The Case of the Missing Salaries
NMAR estimators are closer to the true population mean than MAR estimators; the naive sample mean is badly biased.
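To make the bias of the naive mean concrete, here is a minimal base-R simulation sketch. The data-generating model, the coefficients, and the variable names (`age`, `salary`, `p_resp`) are assumptions chosen for illustration; they are not taken from the talk or from the NMAR package.

```r
# Illustrative simulation (all parameter values are assumptions).
set.seed(2024)
n <- 100000

age    <- rnorm(n)                    # standardized age
salary <- 1 + 0.5 * age + rnorm(n)    # outcome Y in standardized units

# NMAR mechanism: the chance of responding falls as salary rises,
# and also depends on age (delta ~ Y + age).
p_resp <- plogis(0.5 - 1.0 * salary + 0.3 * age)
delta  <- rbinom(n, 1, p_resp)

true_mean  <- mean(salary)               # uses all units (unknown in practice)
naive_mean <- mean(salary[delta == 1])   # respondents only

cat(sprintf("True mean:  %.3f\n", true_mean))
cat(sprintf("Naive mean: %.3f\n", naive_mean))
```

Because high earners respond less often in this setup, the respondent-only mean falls well below the full-sample mean.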
Exponential Tilting - Riddles et al. (2016) (1/2)
Assumptions
Estimate density of observed data: \(f(y|X_1, \delta = 1)\), e.g., salary \(\sim\) experience + education
Use a response model (logit/probit) to estimate \(\pi(x_{2i}, y_i; \phi) = P(\delta_i = 1 | X_{2i}, y_i)\) based on \(y\) (e.g., salary), \(X_2\) (e.g., age), and an intercept
Profile out \(F\) and solve the estimating equations for \((\beta, W, \lambda_W, \lambda_x)\), then plug the solutions back into the formula for \(p_i\).
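The tilting idea behind the response-model step can be sketched in base R. Under a logit response model, the nonresponse odds \((1 - \pi)/\pi\) equal \(\exp(-\text{linear predictor})\), so the respondents' distribution, reweighted by these odds, recovers the nonrespondents' distribution. The sketch below treats the response coefficients `phi` as known, which is a strong simplifying assumption: in practice they must be estimated. All values and names are hypothetical, and this is not the package's implementation.

```r
# Sketch of the tilting identity with KNOWN (oracle) response parameters.
set.seed(2024)
n <- 100000

age    <- rnorm(n)
salary <- 1 + 0.5 * age + rnorm(n)

phi      <- c(0.5, -1.0, 0.3)   # hypothetical (intercept, salary, age)
lin_pred <- phi[1] + phi[2] * salary + phi[3] * age
delta    <- rbinom(n, 1, plogis(lin_pred))
resp     <- delta == 1

# Nonresponse odds for respondents: (1 - pi) / pi = exp(-lin_pred)
odds <- exp(-lin_pred[resp])

# Odds-weighted respondent mean estimates the nonrespondent mean
nonresp_mean_hat <- sum(odds * salary[resp]) / sum(odds)

# Combine observed respondents with the estimated nonrespondent mean
tilted_mean <- (sum(salary[resp]) + sum(!resp) * nonresp_mean_hat) / n
naive_mean  <- mean(salary[resp])
true_mean   <- mean(salary)

cat(sprintf("True: %.3f  Naive: %.3f  Tilted: %.3f\n",
            true_mean, naive_mean, tilted_mean))
```

Only respondent salaries and the count of nonrespondents enter the estimate, which is what makes the identity usable when \(Y\) is missing for nonrespondents.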
Survey sampling?
Can be extended to survey sampling by incorporating the design weights into the likelihood and augmenting auxiliary constraints with stratum indicators.
Properties
Consistent if the response model \(w(y, x; \beta)\) is correctly specified.
Asymptotically normal (under regularity conditions).
NMAR Result
------------
Y mean: -1.048245 (0.049008)
Converged: TRUE
Variance method: bootstrap
Estimator: exponential_tilting
coef(res)
(Intercept) Y
0.5229152 -0.2465591
True Y mean: -1.0408
Est Y mean (NMAR): -1.0482 3σ interval: ( -1.1218 , -0.9747 σ= 0.0490 )
Naive Y mean (MAR): -1.1842
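The reported interval can be checked by hand from the printed estimate and standard error. Judging only from the printed numbers, the bounds appear to correspond to the estimate plus or minus 1.5σ (a total width of 3σ) rather than plus or minus 3σ. This reading is an inference from the output above, not a documented convention.

```r
# Values copied from the printed output above
est        <- -1.048245   # NMAR estimate of the Y mean
se         <-  0.049008   # bootstrap standard error (sigma)
true_mean  <- -1.0408
naive_mean <- -1.1842

# Assumption: interval = estimate +/- 1.5 * sigma (total width 3 * sigma)
interval <- c(est - 1.5 * se, est + 1.5 * se)
print(round(interval, 4))   # -1.1218 -0.9747

covers <- function(x, ci) x >= ci[1] && x <= ci[2]
covers(true_mean, interval)    # TRUE: the true mean lies inside
covers(naive_mean, interval)   # FALSE: the naive mean lies outside
```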
Improving R folder
CI/CD via GitHub Actions
Future Work and Challenges
Large-scale data handling
Analytical variance calculation
Extension to new estimation approaches
Extending estimation beyond the mean to other parameters (quantiles, Gini index)
Rewriting Exptilt EM and Likelihood Jacobians in C/C++ for improved performance
References
Qin, J., Leung, D., & Shao, J. (2002). Estimation With Survey Data Under Nonignorable Nonresponse or Informative Sampling. Journal of the American Statistical Association, 97(457), 193–200. https://doi.org/10.1198/016214502753479338
Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-score-adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(2), 215–245. https://doi.org/10.1093/jssam/smv047